Syntactic annotation of medieval texts: the Syntactic Reference Corpus of Medieval French (SRCMF)
Abstract
This article presents the Syntactic Reference Corpus of Medieval French (SRCMF). The corpus is composed of texts taken from the two major Old French corpora, the Base de Français Médiéval and the Nouveau Corpus d'Amsterdam. This contribution describes some of the core principles of the annotation model, which is based on dependency grammar, as well as the annotation procedure and representation formats.
Similar resources
Building the Syntactic Reference Corpus of Medieval French Using NotaBene RDF Annotation Tool
In this paper, we introduce the NotaBene RDF Annotation Tool, free software used to build the Syntactic Reference Corpus of Medieval French. It relies on a dependency-based model to manually annotate Old French texts from the Base de Français Médiéval and the Nouveau Corpus d’Amsterdam. NotaBene uses OWL ontologies to frame the terminology used in the annotation, which is displayed in a tree-lik...
Old French Dependency Parsing: Results of Two Parsers Analysed from a Linguistic Point of View
The treatment of medieval texts is a particular challenge for parsers. I compare how two dependency parsers, one graph-based, the other transition-based, perform on Old French, facing some typical problems of medieval texts: graphical variation, relatively free word order, and syntactic variation of several parameters over a diachronic period of about 300 years. Both parsers were trained and ev...
Much Ado About Nothing? On the Categorial Status of et and ne in Medieval French
When syntactically annotating a text corpus from an earlier stage of some language, one is confronted with the task of determining the categorial status of the elements encountered. This task can become arduous when some element seems to resist a clear-cut categorial assignment. In this case, one sees oneself in principle confronted with the choice between a 'consistent' approach (assignment of...
Syntactic Complexity of Russian Unified State Exam Texts in English: A Study on Reliability and Validity
In this study we analyze texts used in the Russian Unified State Exam (USE) in English. The texts, which form two small research corpora, were retrieved from two sources: the official USE database, as a reference point, and “Neznaika” (https://neznaika.pro/), a popular website that pupils use for USE training. The two corpora are balanced in size: the USE corpus has 11934 tokens and the “Neznaika” corpus has 11918 tokens. We share Biber’...
Prosody in a corpus of French spontaneous speech: perception, annotation and prosody ~ syntax interaction
Our study focuses on prosodic annotation and the prosody ~ syntax interface in conversation, and is based on a large corpus of conversational speech in French. The results of inter-transcriber agreement tests show that two expert transcribers label prosodic phrasing consistently, with agreement well above chance. A qualitative analysis reveals transcri...